Transformers have achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch-embedding tokens under a different perturbation. To maximally exploit the capability of the Transformer with limited medical data, we propose an auxiliary difficulty-ranking task: the Transformer is enforced to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer strives to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
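The two-branch objective described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the consistency loss is the standard BYOL-style normalized MSE between the online prediction and the target projection, the difficulty-ranking loss is written here as a generic pairwise logistic ranking loss over a hypothetical scalar difficulty head (the paper's exact head and loss may differ), and the target branch follows the online branch by exponential moving average.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize each row vector to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def consistency_loss(online_pred, target_proj):
    """BYOL-style loss: MSE between L2-normalized online predictions and
    target projections (equivalent to 2 - 2 * cosine similarity)."""
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return float(np.mean(np.sum((p - z) ** 2, axis=-1)))

def difficulty_ranking_loss(score_online, score_target, online_is_harder):
    """Auxiliary ranking sketch: a scalar head scores each branch's perturbed
    tokens; a logistic ranking loss pushes the harder branch's score above
    the easier one's. (Illustrative form, not the paper's exact loss.)"""
    margin = score_online - score_target if online_is_harder else score_target - score_online
    return float(np.log1p(np.exp(-margin)))

def ema_update(target_params, online_params, tau=0.99):
    """Target branch tracks the online branch via exponential moving average."""
    return {k: tau * target_params[k] + (1 - tau) * online_params[k]
            for k in target_params}
```

In training, the total loss would combine `consistency_loss` with a weighted `difficulty_ranking_loss`, and `ema_update` would run after each optimizer step.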
During X-ray computed tomography (CT) scanning, metallic implants carried by patients often introduce adverse artifacts into the captured CT images, which then impair clinical treatment. For this metal artifact reduction (MAR) task, existing deep-learning-based methods have achieved promising reconstruction performance. Nevertheless, there is still room to improve MAR performance and generalization ability, since some important prior knowledge underlying this specific task has not been fully exploited. In this paper, we carefully analyze the characteristics of metal artifacts and propose an orientation-shared convolution representation strategy adapted to the physical prior structure of the artifacts, i.e., their rotationally symmetric streaking patterns. The proposed method adopts Fourier-series-expansion-based filter parametrization for artifact modeling, which better separates artifacts from anatomical tissue and boosts model generalizability. Comprehensive experiments on synthesized and clinical datasets show the superiority of our method in detail preservation over current representative MAR methods. Code will be available at \url{https://github.com/hongwang01/OSCNet}
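The key idea of Fourier-series filter parametrization can be sketched as follows. This is an illustrative NumPy construction under my own naming, not the paper's code: a 2D filter is defined by a truncated Fourier series evaluated on a coordinate grid, so rotating the grid produces a rotated filter from the *same* learnable coefficients, which is what lets differently oriented streaks share one representation.

```python
import numpy as np

def fourier_filter(coeffs, size=9, angle=0.0, period=9.0):
    """Build a 2D convolution filter from truncated-Fourier-series
    coefficients evaluated on an (optionally rotated) coordinate grid.
    Because the filter is a continuous function of coordinates, rotating the
    grid rotates the filter while the coefficients stay shared across
    orientations. All names and defaults here are illustrative.

    coeffs: dict mapping frequency pair (p, q) -> (a_pq, b_pq), the
            coefficients of the cos and sin terms.
    """
    r = (size - 1) / 2.0
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    c, s = np.cos(angle), np.sin(angle)
    u = c * xs - s * ys            # rotated coordinates
    v = s * xs + c * ys
    out = np.zeros((size, size))
    for (p, q), (a, b) in coeffs.items():
        phase = 2 * np.pi * (p * u + q * v) / period
        out += a * np.cos(phase) + b * np.sin(phase)
    return out
```

A useful sanity check on the design: evaluating the same coefficients at `angle = pi/2` yields exactly the 90-degree rotation of the `angle = 0` filter.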
Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
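A toy analogue of the sequential-solving idea, under assumptions of my own: tasks are chain MDPs of increasing length (a curriculum), each is solved by deterministic value iteration rather than an RL algorithm, and each task is warm-started from the previous task's values. This only illustrates "solve easier tasks first, reuse the solution"; it does not capture the paper's exploration-efficiency analysis.

```python
import numpy as np

def value_iteration(n, gamma=0.9, v_init=None, tol=1e-10):
    """Solve a deterministic chain MDP with states 0..n: 'right' moves toward
    the goal at state n (reward 1 on arrival, then terminal), 'left' moves
    away. Returns the optimal state values."""
    v = np.zeros(n + 1) if v_init is None else v_init.copy()
    v[n] = 0.0  # terminal state
    while True:
        v_new = v.copy()
        for s in range(n):
            right = (1.0 if s + 1 == n else 0.0) + gamma * v[s + 1]
            left = gamma * v[max(s - 1, 0)]
            v_new[s] = max(right, left)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def curriculum_solve(horizons, gamma=0.9):
    """Sequentially solve tasks of increasing difficulty (longer chains),
    warm-starting each task from the previous solution -- a toy analogue of
    treating easier tasks as an implicit curriculum."""
    v = None
    for n in horizons:
        padded = None if v is None else np.concatenate([v, np.zeros(n + 1 - len(v))])
        v = value_iteration(n, gamma=gamma, v_init=padded)
    return v
```

For the chain of length `n`, the optimal value of state `s < n` is `gamma ** (n - 1 - s)`, so the result of the curriculum run is easy to verify.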
Increasing privacy concerns about personal text data have promoted the development of federated learning (FL) in recent years. However, existing studies applying FL to NLP are not suited to coordinating participants with heterogeneous or private learning objectives. In this study, we further broaden the application scope of FL in NLP by proposing an Assign-Then-Contrast (denoted as ATC) framework, which enables clients with heterogeneous NLP tasks to construct an FL course and learn useful knowledge from each other. Specifically, in the Assign training stage, clients first perform local training on unified tasks assigned by the server rather than on their own learning objectives. After that, in the Contrast training stage, clients train with their different local learning objectives and exchange knowledge with the other clients who contribute consistent and useful model updates. We conduct extensive experiments on six widely used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, and the proposed ATC framework achieves significant improvements compared with various baseline methods. The source code is available at \url{https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks}.
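One way to picture the Contrast-stage exchange is a similarity filter over model updates. The sketch below is a heuristic stand-in of my own, not ATC's actual criterion (which is defined in the paper and repository): a client keeps only peers whose flattened update vectors point in a consistent direction, measured by cosine similarity.

```python
import numpy as np

def select_contributors(my_update, other_updates, threshold=0.0):
    """Heuristic sketch of Contrast-stage partner selection: keep clients
    whose model updates are directionally consistent with ours (cosine
    similarity above a threshold). Illustrative only."""
    def cos(a, b):
        a, b = np.ravel(a), np.ravel(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return [cid for cid, upd in other_updates.items()
            if cos(my_update, upd) > threshold]
```

A client would then aggregate knowledge only from the returned subset instead of from every participant.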
Accurate abnormality localization in chest X-rays (CXR) can benefit the clinical diagnosis of various thoracic diseases. However, lesion-level annotation can only be performed by experienced radiologists, and it is tedious and time-consuming, thus difficult to acquire. This situation makes it hard to develop a fully supervised abnormality localization system for CXR. In this regard, we propose to train the CXR abnormality localization framework via a weakly semi-supervised strategy, termed Point Beyond Class (PBC), which utilizes a small number of fully annotated CXRs with lesion-level bounding boxes and a large amount of weakly annotated samples with only points. Such a point annotation setting provides weak instance-level information for abnormality localization at a marginal annotation cost. In particular, the core idea behind our PBC is to learn a robust and accurate mapping from point annotations to bounding boxes that is invariant to the placement of the annotated points. To this end, a regularization term, namely multi-point consistency, is proposed, which drives the model to generate consistent bounding boxes from different point annotations within the same abnormality. Furthermore, a self-supervision termed symmetric consistency is also proposed to deeply exploit useful information from the weakly annotated data for abnormality localization. Experimental results on the RSNA and VinDr-CXR datasets justify the effectiveness of the proposed method. When trained with less than 20% of the box-level labels, our PBC achieves an improvement of ~5 in mAP compared with the current state-of-the-art method (i.e., Point DETR). Code is available at https://github.com/haozheliu-st/point-beyond-class.
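The two regularizers can be sketched as simple loss terms over predicted boxes. This is an illustrative NumPy rendering under assumptions of my own (box format `(cx, cy, w, h)`, variance/MSE penalties), not the paper's exact formulation.

```python
import numpy as np

def multipoint_consistency(boxes):
    """Multi-point consistency sketch: boxes predicted from several different
    point annotations of the SAME abnormality, each as (cx, cy, w, h),
    should agree; penalize disagreement via the mean per-coordinate variance
    across points."""
    boxes = np.asarray(boxes, dtype=float)   # shape (num_points, 4)
    return float(np.mean(np.var(boxes, axis=0)))

def symmetric_consistency(box, box_from_flipped, image_width):
    """Symmetric-consistency sketch: a box predicted on a horizontally
    flipped image, mapped back to original coordinates, should match the box
    predicted on the original image."""
    cx, cy, w, h = box_from_flipped
    unflipped = np.array([image_width - cx, cy, w, h])
    return float(np.mean((np.asarray(box, dtype=float) - unflipped) ** 2))
```

Both terms are zero exactly when the model's point-to-box mapping is invariant to the point placement and to the flip, which is the behaviour the regularizers reward.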
In recent years, generative adversarial networks (GANs) have shown compelling results in various tasks and applications. However, mode collapse remains a critical problem for GANs. In this paper, we propose a novel training pipeline to address the mode collapse issue of GANs. Different from existing methods, we propose to generalize the discriminator as a feature embedding and to maximize the entropy of the distribution in the embedding space learned by the discriminator. Specifically, two regularization terms, i.e., Deep Local Linear Embedding (DLLE) and Deep Isometric Feature Mapping (DIsoMap), are designed to encourage the discriminator to learn the structural information embedded in the data, so that the embedding space learned by the discriminator is well formed. Based on this well-learned embedding space, a non-parametric entropy estimator is designed to efficiently maximize the entropy of the embedding vectors, as an approximation of maximizing the entropy of the generated distribution. By improving the discriminator and maximizing the distance between the most similar samples in the embedding space, our pipeline effectively reduces mode collapse without sacrificing the quality of the generated samples. Extensive experimental results show the effectiveness of our method, which outperforms the GAN baseline and MaF-GAN on CelebA (9.13 vs. 12.43) and surpasses the recent state-of-the-art energy-based model on the ANIME-FACE dataset (inception score of 2.80 vs. 2.26).
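The non-parametric entropy estimator can be sketched with the classical Kozachenko-Leonenko k-nearest-neighbor estimator, one standard choice for estimating differential entropy from embedding vectors (the paper's exact estimator may differ; this block is illustrative).

```python
import math
import numpy as np

def knn_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN estimator of differential entropy: larger
    k-th nearest-neighbor distances (more spread-out embeddings) give a
    larger entropy estimate. x has shape (n_samples, dim)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    dists = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)
    eps = np.sort(dists, axis=1)[:, k - 1]          # k-th NN distance
    # digamma at positive integers: psi(m) = -euler_gamma + sum_{j<m} 1/j
    psi = lambda m: -0.5772156649015329 + sum(1.0 / j for j in range(1, m))
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)  # log unit-ball volume
    return psi(n) - psi(k) + log_vd + d * float(np.mean(np.log(eps)))
```

Maximizing this quantity over a batch of embeddings pushes the nearest samples apart, which is exactly the "maximize the distance between the most similar samples" effect described above. A handy check: scaling the embeddings by a factor `a` shifts the estimate by `d * log(a)`, matching the behaviour of true differential entropy.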
To investigate the heterogeneity of federated learning in real-world scenarios, we generalize classic federated learning to federated hetero-task learning, which emphasizes the inconsistency among participants in federated learning in terms of both data distributions and learning tasks. We also present B-FHTL, a federated hetero-task learning benchmark consisting of simulated datasets, FL protocols, and a unified evaluation mechanism. The B-FHTL datasets contain three well-designed federated learning tasks with increasing heterogeneity, each of which simulates clients with different non-IID data and learning tasks. To ensure fair comparisons among different FL algorithms, B-FHTL builds in the whole FL protocol through high-level APIs that avoid privacy leakage, and presets the most common evaluation metrics spanning different learning tasks, such as regression, classification, and text generation. Furthermore, we compare FL algorithms from the fields of federated multi-task learning, federated personalization, and federated meta-learning within B-FHTL, and highlight the influence of heterogeneity and the difficulty of federated hetero-task learning. Our benchmark, including the federated datasets, protocols, evaluation mechanism, and preliminary experiments, is open-sourced at https://github.com/alibaba/FederatedScope/tree/master/benchmark/b-fhtl.
Accurate brain tumor segmentation from magnetic resonance imaging (MRI) is desirable through the joint learning of multimodal images. However, in clinical practice it is not always possible to acquire a complete set of MRI modalities, and the missing-modality problem causes severe performance degradation in existing multimodal segmentation methods. In this work, we present the first attempt to exploit Transformers for multimodal brain tumor segmentation that is robust to any combinatorial subset of available modalities. Specifically, we propose a novel multimodal medical Transformer (mmFormer) for incomplete multimodal learning with three main components: a hybrid modality-specific encoder that bridges a convolutional encoder and an intra-modal Transformer for both local and global context modeling within each modality; an inter-modal Transformer that builds and aligns long-range correlations across modalities for modality-invariant features with global semantics corresponding to the tumor region; and a decoder that performs progressive upsampling and fusion with the modality-invariant features to generate robust segmentation. Besides, auxiliary regularizers are introduced in both the encoder and decoder to further enhance the model's robustness to incomplete modalities. We conduct extensive experiments on the public BraTS 2018 dataset for brain tumor segmentation. The results demonstrate that the proposed mmFormer outperforms state-of-the-art methods for multimodal brain tumor segmentation on almost all subsets of incomplete modalities, especially with an average improvement of 19.07% in Dice for tumor segmentation with only one available modality. The code is available at https://github.com/yaozhang93/mmFormer.
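The masking idea that makes a fused representation well-defined for *any* subset of modalities can be sketched in a few lines. This is only an illustration of masked fusion under my own naming; mmFormer itself uses learned intra- and inter-modal Transformers rather than a plain average.

```python
import numpy as np

def fuse_modalities(features, available):
    """Masked-fusion sketch for missing modalities: average token features
    over only the modalities present at inference time, so the fused
    representation is defined for ANY non-empty subset of inputs.

    features: array of shape (num_modalities, tokens, dim)
    available: boolean mask of length num_modalities
    """
    features = np.asarray(features, dtype=float)
    mask = np.asarray(available, dtype=bool)
    if not mask.any():
        raise ValueError("at least one modality must be available")
    return features[mask].mean(axis=0)
```

During training, randomly dropping modalities from `available` (modality dropout) is a common way to make the downstream decoder robust to every subset it may see at test time.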
Inspired by the great success of deep neural networks, learning-based methods have achieved promising performance for metal artifact reduction (MAR) in computed tomography (CT) images. However, most existing methods put less emphasis on modeling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. Against this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple algorithm is carefully designed to solve the model. By unfolding each iterative substep of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, i.e., with clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images from training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments executed on synthesized and clinical datasets demonstrate the superiority of our ACDNet in terms of effectiveness and model generalization. Code is available at \url{https://github.com/hongwang01/acdnet}
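The forward model behind the convolutional dictionary can be sketched directly: the artifact layer is expressed as a sum of dictionary kernels convolved with feature maps, `A = sum_k K_k * M_k`. The sketch below is illustrative only; in ACDNet the kernels are adapted per input image by the network, whereas here they are fixed inputs.

```python
import numpy as np

def synthesize_artifact(kernels, maps):
    """Convolutional dictionary model (sketch): A = sum_k K_k * M_k, where
    each kernel K_k captures a repetitive artifact pattern and each feature
    map M_k places it in the image. 'Same'-size 2D correlation is written
    out explicitly for clarity."""
    h, w = maps[0].shape
    out = np.zeros((h, w))
    for kern, m in zip(kernels, maps):
        kh, kw = kern.shape
        pad = np.pad(m, ((kh // 2,) * 2, (kw // 2,) * 2))
        for i in range(h):
            for j in range(w):
                out[i, j] += np.sum(kern * pad[i:i + kh, j:j + kw])
    return out
```

A deep-unfolding network alternates between estimating the feature maps (given kernels) and refining the clean image, with each solver substep realized as a learned module.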
The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made its deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of FGL-related frameworks increases the effort required to accomplish reproducible research and to deploy in real-world applications. Against this issue, in this paper we first discuss the challenges in creating an easy-to-use FGL package, and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive data and models for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously yield many valuable insights about FGL. Moreover, we employ FS-G to serve an FGL application in a real-world E-commerce scenario, where the attained improvements indicate great potential business benefits. We publicly release FS-G at https://github.com/alibaba/FederatedScope as a submodule of FederatedScope, to promote research on FGL and to enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.